Abstract: As exploration depth increases, Au deposits are becoming more difficult to find. Owing to
time and cost constraints, traditional geological exploration methods are increasingly difficult
to apply effectively. Thus, new methods and ideas are urgently needed. This study assessed the feasibility
and effectiveness of using hyperspectral technology to prospect for hidden Au deposits. For this purpose, 48
plant (Seriphidium terrae-albae) and soil (aeolian gravel desert soil) samples were first collected along a sampling
line that traverses an Au mineralization alteration zone (Aketasi mining region in an arid region of China). The
soil Au contents were determined by chemical analysis, and the plant reflectance spectra were measured
with an Analytical Spectral Devices (ASD) FieldSpec3 spectrometer. Then, the corresponding
relationship between the soil Au content anomaly and concealed Au deposits was investigated. Additionally, 
the characteristic bands were selected from plant spectra using four different methods, namely, genetic 
algorithm (GA), stepwise regression analysis (STE), competitive adaptive reweighted sampling (CARS), and 
correlation coefficient method (CC), and were then input into the partial least squares (PLS) method to 
construct a model for estimating the soil Au content. Finally, the quantitative relationship between the soil Au 
content and the 15 different plant transformation spectra was established using the PLS method. The results 
were compared with those of a model based on the full spectrum. The results obtained in this study indicate 
that the location of concealed Au deposits can be predicted based on soil geochemical anomaly information, 
and it is feasible and effective to use the full plant spectrum and PLS method to estimate the Au content in 
the soil. The cross-validated coefficient of determination (R²) and the ratio of performance to deviation
(RPD) between the predicted value and the measured value reached maxima of 0.8218 and 2.37,
respectively, with a minimum value of 6.56 μg/kg for the root-mean-squared error (RMSE) in the full
spectrum model. However, in the process of modeling, it is crucial to select the appropriate transformation 
spectrum as the input parameter for the PLS method. Compared with the GA, STE, and CC methods, CARS 
was the superior characteristic band screening method based on the accuracy and complexity of the model. 
When modeling with characteristic bands, the highest accuracy (R² of 0.8016, RMSE of 7.07 μg/kg, and RPD
of 2.20) was obtained when 56 characteristic bands were selected from the transformed spectrum (1/lnR)'
(the first derivative of the reciprocal of the logarithmic spectrum) of the sampled plants using
the CARS method and were input into the PLS method to construct an inversion model of the Au content 
in the soil. Thus, characteristic bands can replace the full spectrum when constructing a model for estimating 
the soil Au content. Finally, this study proposes a method of using plant spectra to find concealed Au deposits, 
which may have promising application prospects because of its simplicity and rapidity.

1 Introduction


Soil geochemical prospecting involves studying the regularity of the dispersion and concentration 
of metallic elements in the soil and the corresponding relationship with ore bodies in the bedrock. 
This approach is based on systematic measurements of the distribution of metallic elements in the 
soil and then prospecting for hidden ore bodies by identifying chemical anomalies and interpreting
and evaluating them (Yang et al., 2018). Soil geochemical methods have been widely used in
geological prospecting because they offer advantages such as strong reliability, low
cost, simple implementation, and short operation period (Arias, 1996; Smee, 1998; Smith et al.,
2011; Wang et al., 2013; Timofeev et al., 2016). However, in traditional soil geochemical
prospecting, the metal content in the collected soil samples is mainly measured by laboratory 
chemical methods (Von Steiger et al., 1996; Kemper and Sommer, 2002); although the accuracy 
is high, this process is time consuming and laborious, and the heavy metal content in the soil is 
only determined in a small area (Yousefi et al., 2018; Han et al., 2020). Additionally, it is difficult 
to obtain continuously distributed information related to heavy metals at large scales, resulting in 
high levels of uncertainty regarding the exploration of concealed deposits (Liu et al., 2019). 

The emergence of hyperspectral remote sensing technology has provided new methods 
to macroscopically and rapidly acquire soil heavy metal information over large areas (Sun and 
Zhang, 2017; Shi et al., 2018). Hyperspectral remote sensing provides multiband, high-spectral- 
resolution, and continuous results. Thus, continuous soil spectra in the visible, near-infrared, and 
mid-infrared ranges can be obtained (Cheng et al., 2019). Additionally, heavy metal elements are 
usually adsorbed or hosted in clay minerals, iron oxides, and organic matter, and exhibit relatively 
significant spectral characteristics in the soil spectrum; therefore, hyperspectral techniques can 
potentially be used to remotely retrieve information related to metal elements in the soil (Vega et 
al., 2006; Sun et al., 2018). Scholars have increasingly used mathematical regression methods to 
establish the quantitative relationships between heavy metals and spectral characteristic 
parameters (Kooistra et al., 2001; Kemper and Sommer, 2002; Moros et al., 2009; Chakraborty et 
al., 2015; Sawut et al., 2018; Yousefi et al., 2018). Moreover, many researchers have predicted 
the spatially continuous distribution of heavy metals in the soil (Tan et al., 2021; Wang et al., 
2021). 

Although the above studies have verified that hyperspectral techniques have considerable 
potential for predicting the metal content in the soil, it is extremely difficult to use these methods 
to retrieve the metal content in the soil because of interference from plants in plant-covered areas. 
However, many studies have shown that several plants can absorb metal elements from the soil 
through their roots during growth (Bandaru et al., 2016; Cui et al., 2021). When heavy metal 
elements excessively accumulate in plants, the chlorophyll content, cell structure, and water vapor 
content of the plants change, resulting in changes to their reflectance spectra (Hoque and Huntzler, 
1992; Dunagan et al., 2007; Ren et al., 2008), which can provide a theoretical basis for the indirect 
inversion of metal element contents in the soil. Some scholars have successfully estimated the 
heavy metal content in the soil beneath plants by using plant spectra. For instance, Kooistra et al. 
(2003) found that a vegetation index (Modified Soil-Adjusted Vegetation Index 2 Model 
(MSAVI2)) constructed based on the reflectance spectra of perennial ryegrass (Lolium perenne) 
could be used to characterize the level of Zn pollution in the soil. Shi et al. (2016a) studied rice and
found that a three-band vegetation index displayed greater potential than a two-band vegetation
index for estimating the metal content in the soil because of its ability to use more spectral
information. Among the various indices, the three-band vegetation index (R716−R568)/(R552−R568)
(where R716, R568, and R552 represent the reflectivity at wavelengths of 716, 568, and 552 nm,
respectively) is considered the optimal index for estimating soil metal contents. The above studies
have shown that plant spectra and indices
constructed based on spectra can be used to effectively monitor the heavy metal content in the soil
in plant-covered areas.

Arid and semi-arid regions in China are rich in mineral resources, but it is difficult to find valuable
mineral deposits there due to the presence of thick Quaternary formations. Thus, new prospecting
methods and ideas are urgently needed. A small half-shrub species called Seriphidium terrae-albae
is widely distributed in the mining area, which encompasses the Aketasi gold-copper mine, the
Kalatongke copper-nickel mine, the Xilekudute copper-molybdenum mine, and the Ketebieteti 
copper-nickel mine. Scholars have verified that the plant is able to absorb Au from the soil without 
hindrance and displays a geochemical anomaly (Song et al., 2016; Song et al., 2017). Such an 
anomaly may provide a reference for the exploration of hidden Au deposits. Although extensive past 
research has demonstrated that plant spectra can be used to accurately estimate the contents of metals 
in the soil (Kooistra et al., 2003; Shi et al., 2016a, b), plant species and soil type are also two key 
factors related to the plant response to metals (Horler et al., 1980). Therefore, it is necessary to further 
verify whether the spectrum of Seriphidium terrae-albae can be used to estimate the Au content in 
arid desert soils. To the best of our knowledge, this topic has not been addressed. 

Therefore, the first goal of this study is to determine whether abnormalities in the soil Au content 
can be used as a prospecting indicator for concealed Au deposits; if so, can they indicate the location 
of deep hidden ore deposits? The second purpose of this study is to determine whether the spectrum 
of a plant species (Seriphidium terrae-albae) widely distributed in the arid desert area can be used to 
estimate the Au contents in the soil with high accuracy. If these two tasks are proven feasible, it 
would suggest that plant spectra can be used to quickly identify the hidden Au deposits. This finding 
would be of considerable significance for expanding the prospecting space, improving the 
prospecting efficiency, and accelerating the pace of mineral exploration. 


2 Methods and materials 


2.1 Study area and experimental design


The study area is located in the Aketasi mining region in arid areas of China, where the main mineral 
types are Au and Cu. The main soil type in the study area is aeolian gravel desert soil (Song et al., 
2017). Traditional geological prospecting methods cannot be effectively implemented in this area
because Quaternary deposits cover most of the area. By contrast, soil geochemical prospecting has
notable potential in this area due to its deep penetration ability.

The main plant species distributed in the study area is Seriphidium terrae-albae (as shown in Fig. 
1), which is a small half-shrub with small leaves and a relatively developed root system; it can grow 
in extremely dry and harsh environments because of its high resistance to drought, high temperature, 
sand damage, and poor soil. The plant usually germinates in late March and grows
rapidly, reaching a peak in mid-May. After a brief summer dormancy, it germinates in early August,
blooms in mid-August, and seeds in September before ripening in late October. The plant species
enters a drought stage in November and effectively makes it through the winter.

Fig. 1 Natural landscape of the study area

To determine whether the Au content in the soil can be estimated using Seriphidium terrae-albae
plant spectra, we designed a 1-km-long sampling line with 48 sampling points at 20 m intervals (as
shown in Fig. 2) in accordance with the location of a known hidden Au deposit. The sampling line
ran in the north-south direction and crossed both the area above the deposit and the
background area. The geographic locations of the 48 sampling points were recorded using a handheld
global positioning system (GPS). The purposes of this design were to ensure that the range of Au
contents in the collected soil samples was wide and to obtain representative samples. Next, the plant
and soil samples were collected along the sampling line. Healthy plants (Seriphidium terrae-albae)
growing around the sampling points were sampled, and each plant sample weighed approximately
200 g. Each soil sample was collected at the same location as the corresponding plant sample and
also weighed approximately 200 g.

Fig. 2 Demonstration of the locations of sampling points. Note that the numbers represent the sampling points.


2.2 Measurement of the Au content in the soil samples


The collected soil samples were packaged in fresh-keeping bags and taken to the laboratory. They 
were dried, ground, and stored in sealed containers to measure the heavy metal content. Before 
element analysis, the soil samples were dissolved using the nitric acid-hydrogen peroxide digestion 
method, and then the Au content in the soil samples was measured with a ZEEnit650P atomic 
absorption spectrometer produced by Analytik Jena AG Company in Jena, Germany. The detection 
limit of the ZEEnit650P atomic absorption spectrometer is 0.2 ng/g. The statistical results for the Au 
content in the 48 soil samples collected are shown in Table 1. 


Table 1 Statistics of the Au content in the 48 collected soil samples


Element type   Minimum value   Maximum value   Average value   Standard deviation   Coefficient of variation
               (μg/kg)         (μg/kg)         (μg/kg)         (μg/kg)
Au             0.3437          77.5711         6.0543          15.7080              2.5945


2.3 Collection of plant spectra and preprocessing 


The canopy reflectance spectra of the plants were measured using an Analytical Spectral Devices (ASD)
FieldSpec3 portable spectrometer (Analytical Spectral Devices, Boulder, CO, USA) in clear and
windless weather. To reduce the influence of changes in the solar elevation angle on
the spectral measurements, we carried out the measurements from 12:00 to 14:00 Beijing time on
20 July 2016. During the measurement process, the spectrometer probe was pointed vertically
downward with a 25° field of view, and the distance between the probe and the plant canopy top was
0.2 m. Five reflectance curves were first collected from the plant sample at each sampling point; then,
spectral curves that deviated strongly from the others were removed. Finally, the average of the
remaining spectral curves was taken as the final reflectance spectrum of the plant sample at each
sampling point.

Reflectance spectra at wavelengths shorter than 400 nm and longer than 2400 nm were removed to
reduce noise (Sun and Zhang, 2017). Reflectance spectra in the ranges of 1300–1400 and 1800–2000
nm were also removed to eliminate the influence of atmospheric water vapor (Wang et al., 2015).
After these removals, each sample included 1700 bands.

The adjacent bands of hyperspectral data are highly correlated, leading to information redundancy.
Therefore, in this study, the original spectra were resampled by calculating the average value of five
adjacent bands. The reflectance spectrum of each plant sample was reduced to 340 bands after
resampling. The reflectance spectra of the plant samples collected after pretreatment are shown in
Figure 3.

Fig. 3 Reflectance spectra of the 48 collected plant samples
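
The band removal and five-band averaging described in this section can be sketched as follows. This is a minimal illustration, not the processing code used in the study: the function name `preprocess_spectrum`, the 1 nm input grid, and the half-open interval boundaries are assumptions, chosen because they reproduce the stated counts of 1700 retained bands and 340 resampled bands.

```python
def preprocess_spectrum(wavelengths, reflectance, window=5):
    """Drop noisy and water-vapor bands, then average every `window` bands.

    Assumption: intervals are half-open, i.e. 400 <= w < 2400 is kept and
    1300 <= w < 1400 and 1800 <= w < 2000 are removed. The text does not
    state how the interval boundaries were handled.
    """
    kept = [
        (w, r)
        for w, r in zip(wavelengths, reflectance)
        if 400 <= w < 2400
        and not (1300 <= w < 1400)
        and not (1800 <= w < 2000)
    ]
    n_groups = len(kept) // window  # bands that do not fill a last group are dropped
    out_w, out_r = [], []
    for g in range(n_groups):
        group = kept[g * window:(g + 1) * window]
        out_w.append(sum(w for w, _ in group) / window)
        out_r.append(sum(r for _, r in group) / window)
    return out_w, out_r


# Hypothetical flat spectrum on a 350-2500 nm grid with 1 nm spacing.
grid = list(range(350, 2501))
centers, resampled = preprocess_spectrum(grid, [0.5] * len(grid))
```

With these boundaries the three retained segments contain 900, 400, and 400 bands, so no averaged group straddles a removed interval.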


2.4 Partial least squares (PLS) method 


The PLS method is a multiple linear regression approach that combines the advantages of principal
component analysis, canonical correlation analysis, and multiple linear regression analysis (Han et al.,
2020). This method is especially suitable for modeling when there is multicollinearity between
independent variables and the number of samples is less than the number of variables (Huang et al., 
2004). The PLS method can simplify the structure of hyperspectral data by effectively reducing the 
information redundancy and overcoming multiple correlation issues among bands (Jin and Wang, 
2019). Additionally, a hyperspectral inversion model established based on the PLS method can 
maximize the use of spectral information through principal component analysis to improve the 
modeling accuracy and estimates; therefore, the PLS method was chosen in this study. 

During the modeling process using the PLS method, the optimal number of principal components
was determined using the coefficient of determination (R²) between the predicted value obtained
through the leave-one-out cross-validation method and the measured value. The number of principal
components corresponding to the maximum R² value was selected as the optimal number of principal
components.

The leave-one-out cross-validated R², the ratio of performance to deviation (RPD), and the
root-mean-squared error (RMSE) were computed for model evaluation. In general, a good model should
have high R², high RPD, and low RMSE values.
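
The component selection and evaluation procedure described above can be sketched as follows, assuming NumPy is available. This is an illustration under stated assumptions rather than the study's implementation: the PLS1 fit uses the NIPALS algorithm, R² is computed as 1 − SSE/SST, and RPD is taken as the population standard deviation of the measured values divided by the RMSE (the text does not give these formulas). The function names and the synthetic data are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS1 with NIPALS; return regression coefficients and centering terms."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)     # weight vector
        t = Xr @ w                    # scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        c = (yr @ t) / tt             # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - c * t               # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, x_mean, y_mean


def loocv_metrics(X, y, n_components):
    """Leave-one-out predictions and the resulting R2, RMSE, and RPD."""
    n = len(y)
    pred = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        beta, x_mean, y_mean = pls1_fit(X[train], y[train], n_components)
        pred[i] = (X[i] - x_mean) @ beta + y_mean
    rmse = float(np.sqrt(np.mean((y - pred) ** 2)))
    r2 = float(1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2))
    rpd = float(np.std(y) / rmse)     # assumed definition: population SD / RMSE
    return r2, rmse, rpd


def select_components(X, y, max_components):
    """Pick the number of components with the highest cross-validated R2."""
    scores = {a: loocv_metrics(X, y, a) for a in range(1, max_components + 1)}
    best = max(scores, key=lambda a: scores[a][0])
    return best, scores[best]


# Synthetic stand-in for spectra (40 samples x 60 bands) driven by 3 latent factors.
rng = np.random.default_rng(0)
T = rng.normal(size=(40, 3))
X_demo = T @ rng.normal(size=(3, 60)) + 0.01 * rng.normal(size=(40, 60))
y_demo = 2.0 * T[:, 0] + 0.01 * rng.normal(size=40)
best_a, (r2, rmse, rpd) = select_components(X_demo, y_demo, max_components=5)
```

Under these assumed definitions, R² and RPD are linked by R² = 1 − 1/RPD², so the two indicators rank models identically.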


2.5 Spectral transformations 


Various spectral transformations, namely the original spectrum (R), logarithmic spectrum (lnR), reciprocal
spectrum (1/R), square root of the original spectrum (√R), reciprocal of the logarithmic spectrum
(1/lnR), and the corresponding first and second derivatives, were input into the PLS method as
independent variables to construct a quantitative model for estimating the Au content in the soil. The
sequence numbers of the 15 transformed spectra and the corresponding calculation methods are as
follows: ST1, original reflectance spectrum (R); ST2, the first derivative of the original reflectance
spectrum (R'); ST3, the second derivative of the original reflectance spectrum (R''); ST4, logarithm of
the original spectrum (lnR); ST5, the first derivative of the logarithm of the original spectrum (lnR)';
ST6, the second derivative of the logarithm of the original spectrum (lnR)''; ST7, reciprocal of the
original spectrum (1/R); ST8, the first derivative of the reciprocal of the original spectrum (1/R)'; ST9,
the second derivative of the reciprocal of the original spectrum (1/R)''; ST10, square root of the original
spectrum (√R); ST11, the first derivative of the square root of the original spectrum (√R)'; ST12, the
second derivative of the square root of the original spectrum (√R)''; ST13, reciprocal of the logarithmic
spectrum (1/lnR); ST14, the first derivative of the reciprocal of the logarithmic spectrum (1/lnR)'; and ST15,
the second derivative of the reciprocal of the logarithmic spectrum (1/lnR)''.
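
As a small illustration of the five underived transformations listed above, the following sketch computes them for a reflectance vector. The function name `base_transforms` is hypothetical, and the sketch assumes reflectance values strictly between 0 and 1 so that lnR is defined and nonzero.

```python
import math


def base_transforms(reflectance):
    """Return the five underived spectra, keyed by their labels in the text."""
    return {
        "ST1": list(reflectance),                          # R
        "ST4": [math.log(r) for r in reflectance],         # lnR
        "ST7": [1.0 / r for r in reflectance],             # 1/R
        "ST10": [math.sqrt(r) for r in reflectance],       # square root of R
        "ST13": [1.0 / math.log(r) for r in reflectance],  # 1/lnR
    }


# Hypothetical two-band reflectance vector.
spectra = base_transforms([0.25, 0.5])
```

The remaining ten spectra are the first and second derivatives of these five.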

Derivative spectroscopy has been increasingly used in spectroscopic analysis, and this study is no
exception. Notably, in the derivatives of plant spectra, the influence of the soil background is
effectively removed, and derivatives can be used to accurately extract physiological information from
crop canopy spectra (Shi et al., 2014; Cui et al., 2018). In short, derivative spectroscopy can
conveniently determine the characteristic parameters of a spectrum, such as the inflection points and
the wavelength positions associated with the maximum and minimum reflectivity (Cui et al., 2018).
Thus, subtle spectral changes in soil or plants caused by metal stress can be identified. We used the
difference method to calculate the first and second derivatives because the collected plant spectra 
were discrete data. The formulas for calculating the first and second derivatives are shown in 
Equations 1 and 2, respectively. 


$$S'(\lambda_i) = \frac{S(\lambda_{i+1}) - S(\lambda_{i-1})}{2\Delta\lambda} \tag{1}$$

$$S''(\lambda_i) = \frac{S'(\lambda_{i+1}) - S'(\lambda_{i-1})}{2\Delta\lambda} \tag{2}$$


where S(λ_{i+1}) represents the value at wavelength λ_{i+1} (nm) under different spectral transformations;
S(λ_{i-1}) represents the value at wavelength λ_{i-1} (nm) under different spectral transformations; S'(λ_i)
represents the first derivative at wavelength λ_i (nm) under different spectral transformations; S''(λ_i)
represents the second derivative at wavelength λ_i (nm) under different spectral transformations; and
Δλ (nm) represents the wavelength interval.
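
The difference method can be sketched in a few lines, using the central-difference form implied by the symbol definitions above. The function names are hypothetical, and the treatment of the two end bands is an assumption: the text does not say how they were handled, so this sketch simply drops them, which shortens the spectrum by two bands per differentiation.

```python
def central_difference(values, step):
    """Derivative at interior points: (S[i+1] - S[i-1]) / (2 * step)."""
    return [
        (values[i + 1] - values[i - 1]) / (2.0 * step)
        for i in range(1, len(values) - 1)
    ]


def first_and_second_derivative(values, step):
    """Apply the central difference once and twice (end bands are dropped)."""
    first = central_difference(values, step)
    second = central_difference(first, step)
    return first, second


# Check on a quadratic, S = w**2, for which S' = 2w and S'' = 2.
wavelengths = [400 + 5 * k for k in range(9)]      # 400, 405, ..., 440 nm
quadratic = [float(w * w) for w in wavelengths]
d1, d2 = first_and_second_derivative(quadratic, step=5)
```

Here `step=5` mirrors the 5 nm spacing that results from averaging five adjacent 1 nm bands.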


2.6 Feature band selection method and evaluation of the constructed model 


The reflectance spectra of plants collected using the ASD spectrometer included a vast number of 
spectral bands, and different bands may make different contributions in the estimation of the Au 
content in the soil. The characteristic wavelength variables that significantly contribute to the Au 
content estimates were selected via various methods to reduce the modeling time and simplify the 
modeling process. The most important goal was to eliminate irrelevant variables and establish a 
robust and accurate quantitative model. 

In this study, four methods, namely the genetic algorithm (GA), stepwise regression analysis
(STE), competitive adaptive reweighted sampling (CARS), and correlation coefficient (CC) methods,
were used to select the characteristic bands from all spectral bands in the range of 400–2400 nm; the
selected bands were then input into the PLS method to construct an inversion model of the Au content
in the soil. Because the number of samples was small, the R² between the predicted Au content obtained
using the leave-one-out cross-validation method and the measured Au content was used to evaluate the
accuracy and stability of the constructed model. The larger the R² value, the greater the accuracy and
stability of the constructed model.

The GA is a stochastic global search and optimization algorithm based on natural selection and 
natural genetic mechanisms in the biological world (Hong et al., 2018). This algorithm can simulate 
natural selection and the phenomena of replication, crossover, and mutation in the genetic process. 
Through continuous genetic iterations, variables that increase the value of the objective function are 
retained, comparatively poor variables are eliminated, and globally optimal results are ultimately 
obtained (Jarvis and Goodacre, 2005). Additionally, a parallel search method that can improve the 
accuracy and stability of the analysis results is used. Therefore, in this study, the GA and partial least 
squares (GA-PLS) method were combined to estimate the Au content. First, the Au content in the 
soil was identified as the optimization objective, and plant spectral data were used as genes. Binary 
coding was then performed to randomly generate the initial population. The root-mean-squared error
of cross-validation (RMSECV) was used as the fitness function, and the GA was used to optimize
the input characteristic bands. The selected characteristic bands were then entered into the PLS
method to establish an estimation model of the Au content. In using the GA method for feature band
screening, the population size was set to 30, the maximum number of iterations was 100, the 
crossover probability was 0.5, and the mutation probability was 0.01. 
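
A minimal sketch of a binary-coded GA with the parameter values stated above (population size 30, 100 iterations, crossover probability 0.5, mutation probability 0.01) is given below. The selection scheme, crossover operator, and elitism are assumptions, since the text does not specify them; tournament selection, single-point crossover, and keeping the best individual are used here. The fitness function is passed in as a callable to be minimized; in the study it would be the RMSECV of a PLS model built on the bands flagged in the mask, whereas the toy fitness below is purely hypothetical.

```python
import random


def tournament(pop, fitness, rng):
    """Pick the fitter of two randomly drawn individuals (lower fitness wins)."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) <= fitness(b) else b


def ga_select_bands(fitness, n_bands, pop_size=30, n_iter=100,
                    p_cross=0.5, p_mut=0.01, seed=0):
    """Return the best 0/1 band mask found and the best fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bands)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(n_iter):
        children = [best[:]]                      # elitism: keep the current best
        while len(children) < pop_size:
            c1 = tournament(pop, fitness, rng)[:]
            c2 = tournament(pop, fitness, rng)[:]
            if rng.random() < p_cross:            # single-point crossover
                cut = rng.randrange(1, n_bands)
                c1, c2 = c1[:cut] + c2[cut:], c2[:cut] + c1[cut:]
            for child in (c1, c2):
                for j in range(n_bands):          # bit-flip mutation
                    if rng.random() < p_mut:
                        child[j] = 1 - child[j]
                children.append(child)
        pop = children[:pop_size]
        best = min(pop, key=fitness)
        history.append(fitness(best))
    return best, history


# Toy fitness: distance to a hypothetical "ideal" mask (stand-in for RMSECV).
target = [1 if j % 4 == 0 else 0 for j in range(40)]
toy_fitness = lambda mask: sum(m != t for m, t in zip(mask, target))
best_mask, history = ga_select_bands(toy_fitness, n_bands=40)
```

Because the best individual is always carried over, the best fitness never worsens from one generation to the next.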

The STE screening of characteristic bands involved first sequentially adding each spectral band to 
the statistical model and then performing an F test of the statistical model after each spectral band 
was introduced. The P-value of the F test was used for spectral band introduction and elimination 
(Jin and Wang, 2019). In this study, the maximum P-value for a spectral band to be included was 
defined as 0.05, and the minimum P-value for a spectral band to be removed was defined as 0.10. 

In this study, the CC method was also used to select characteristic bands. First, the correlation coefficient and
P-value between each band within the range of 400–2400 nm and the Au content in the soil were
calculated; then, the spectral bands with P-values less than 0.01 were screened out and retained as the
selected characteristic bands.
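
A correlation-based screening step of this kind can be sketched as follows, assuming that the bands whose correlation with the soil Au content is significant are the ones kept. To stay dependency-free, the sketch thresholds the absolute Pearson coefficient instead of computing P-values; the default `r_crit=0.368` is an assumed approximation of the two-tailed P < 0.01 threshold for 48 samples and should be recomputed for other sample sizes. The function names and toy data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_by_correlation(spectra, au, r_crit=0.368):
    """Return indices of bands with |r| >= r_crit.

    spectra: list of samples, each a list of band values.
    au: soil Au content per sample.
    r_crit: assumed stand-in for the P < 0.01 criterion (approximate, n = 48).
    """
    selected = []
    for j in range(len(spectra[0])):
        band = [sample[j] for sample in spectra]
        if abs(pearson_r(band, au)) >= r_crit:
            selected.append(j)
    return selected


# Toy data: band 0 tracks Au, band 1 alternates, band 2 is inversely related.
au_toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
spectra_toy = [[a, s, -a] for a, s in zip(au_toy, alternating)]
chosen = select_by_correlation(spectra_toy, au_toy)
```

The toy example only demonstrates the mechanics; with six samples the 0.368 threshold has no statistical meaning.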

The CARS algorithm simulates the "survival of the fittest" principle of Darwinian evolution (Li et al., 2009; Hong et
al., 2018). In the process of characteristic band selection using this method, each wavelength variable
was treated as an individual; wavelength variables associated with regression coefficients with large
absolute values in the PLS model were retained, and wavelength variables associated with regression
coefficients with small absolute values were eliminated by using an adaptive reweighted sampling
method and an exponentially decreasing function. In this way, many subsets of spectral variables can be
screened, and the optimal spectral variable subset can be selected on the basis of the RMSECV
obtained using cross-validation. The subset corresponding to the minimum RMSECV was chosen as
the optimal variable subset. In this study, the number of Monte Carlo sampling runs was set to 50.
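
The exponential decay step can be illustrated with the retention schedule commonly used in CARS, in which the fraction of variables kept in run i is r_i = a·exp(−k·i), with a and k chosen so that all variables are kept in the first run and two remain in the last. These constants are an assumption here, since the text does not state them, and the sketch covers only this forced-reduction schedule, not the adaptive reweighted sampling or the PLS refits. The function name is hypothetical.

```python
import math


def cars_retention_schedule(n_vars, n_runs):
    """Number of variables kept after each sampling run under exponential decay."""
    a = (n_vars / 2.0) ** (1.0 / (n_runs - 1))
    k = math.log(n_vars / 2.0) / (n_runs - 1)
    return [round(n_vars * a * math.exp(-k * i)) for i in range(1, n_runs + 1)]


# 340 resampled bands and 50 Monte Carlo sampling runs, as in this study.
schedule = cars_retention_schedule(n_vars=340, n_runs=50)
```

The schedule drops many variables in the early runs and few in the late runs, which gives a fast coarse screening followed by a finer one.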


3 Results 


3.1 Analysis and evaluation of soil geochemical characteristics 


Figure 4 shows an M-shaped soil Au content anomaly in the area
corresponding to sampling points 25–33. Furthermore, Figure 5 shows that the soil Au content in this
area was significantly higher than that in other areas (P<0.001), which indicates that abnormal soil
Au contents may be present. Moreover, concealed gold mineralization zones have been found below
this area, and the drilling data indicate an ore grade in the range from 1.5 to 4.0 g/t (Song et al.,
2017). The above results imply that the abnormal soil Au contents can potentially indicate the
locations of hidden Au deposits. Therefore, using the proposed soil geochemical method for mineral 
exploration in this area is feasible and effective. However, it is time consuming and laborious to use 
traditional methods to measure the Au content in the soil. Therefore, in later sections, we used 
hyperspectral technology to quickly estimate the Au content in the soil. 


3.2 Influence of spectral transformation on the estimation of the Au content 


Table 2 shows that when using all bands in the range of 400–2400 nm in Seriphidium terrae-albae
plant spectra and the PLS method to construct the model for estimating the Au content in the soil,
there were significant differences in the estimation accuracies of the model for different forms of the
transformed spectra. The R² between the predicted Au content obtained using the leave-one-out
cross-validation method and the measured Au content in the soil varied widely from 0.1785 to 0.8218. The
RMSE and RPD values of the estimation models constructed using different forms of the transformed
spectra also varied considerably. The lowest RMSE value was 6.56 μg/kg, and the highest was 14.53
μg/kg; additionally, the lowest RPD value was 1.07, and the highest was 2.37. These results indicate
that it is crucial to choose the appropriate spectral transformation when establishing a model for
estimating the Au content in the soil using plant spectra.

Table 2 also indicates that the Au content estimation models obtained by inputting the five spectra ST1,
ST4, ST7, ST10, and ST13 as independent variables into the PLS method were comparatively inaccurate,
with maximum R² and RPD values of 0.4451 and 1.31, respectively; the lowest RMSE was as high
as 11.87 μg/kg. These results imply that some types of transformed spectra are not suitable for
estimating the Au content in the soil. By contrast, the accuracy of the estimation models of the soil Au
content constructed by inputting the first and second derivatives of these five types of transformed
spectra as independent variables into the PLS method was significantly higher. The lowest R² and
RPD values were 0.2974 and 1.07, respectively, and the highest values reached 0.8218 and 2.37,
respectively. Compared with those of the original spectra, the R² and RPD values of the models
constructed based on derivative spectra were greatly improved, with increases of 16.96%–235.32%
and −0.926%–80.916%, respectively. Finally, the results indicated that ST14
((1/lnR)') was the best spectral transformation. The modeling accuracy in estimating the
Au content was highest when using this transformation compared with the 14 other
types of transformed spectra selected in this study.

Fig. 5 Number and average Au content of soil samples collected in mineralized and non-mineralized altered zones.
Note that boxes represent interquartile ranges (25th to 75th percentiles); the thick horizontal bar in each box denotes the
median (50th percentile); whiskers (thin horizontal bars) represent the highest and the lowest values, respectively; red
and green rectangles denote the average value; and red and green dots represent the Au content of each soil sample in
the mineralized altered zone and non-mineralized altered zone, respectively.

Figure 6 shows the scatter diagram of the predicted soil Au content obtained using the leave-one-out
cross-validation method versus the measured soil Au content. Notably, when the Au content in the
soil was greater than 5 μg/kg, the prediction effect was better, and the predicted and measured values
were evenly distributed on both sides of the 1:1 line. By contrast, when the Au content was less
than 5 μg/kg, the prediction effect was poor, and the predicted Au contents of some samples were
negative.

Fig. 6 Scatter diagram of the predicted soil Au content obtained using the leave-one-out cross-validation method
versus the measured soil Au content. R², coefficient of determination; RMSE, root-mean-squared error; RPD, ratio of
performance to deviation; 1/lnR, reciprocal of logarithmic spectrum; ', first derivative.


3.3 Influence of the band selection method on the Au content estimations 


To study the effects of different band selection methods on the estimation of the Au content in the
soil, we first selected the characteristic bands of the 15 types of transformed spectra using the
GA, CARS, STE, and CC methods. Then, we input the selected characteristic bands into the PLS
method to build estimation models of the Au content in the soil, constructed 15 estimation models
for each band selection method, and calculated the R², RMSE, and RPD between the predicted value
obtained using the leave-one-out cross-validation method and the measured Au content (see Table 2).
Finally, we used the average values of the R², RMSE, and RPD for the 15 models, which were
respectively defined as the M1, M2, and M3 values, as indicators to evaluate the stability and
generalization ability of characteristic band selection with different methods. The larger the M1
and M3 values and the smaller the M2 value, the higher the stability and accuracy of the band
selection method.

Figure 7 shows that the M1 and M3 values of the GA method were slightly higher than those of
the full spectrum method, and the M2 value of the GA method was slightly lower than that of the
full spectrum method. Additionally, the M1 and M3 values of the CARS method were slightly lower
than those of the full spectrum method, and the M2 value of the CARS method was slightly higher than
that of the full spectrum method. The M1 and M3 values of the STE and CC methods were
considerably lower than those of the full spectrum method, and their M2 values were significantly
higher than that of the full spectrum method. If deciding among methods based only on the M1,
M2, and M3 values, the GA was the best band selection method. However, in the process of
modeling, the number of characteristic bands used was also a key factor that should be considered.
Using too many characteristic bands can increase the complexity of the model and reduce its
anti-interference capability. Therefore, when determining the optimal band selection method, the
accuracy and complexity of the constructed model need to be comprehensively considered. Thus,
the average numbers of characteristic bands in the 15 estimation models constructed based on the four
band selection methods were also calculated. Figure 7b indicates that the number of characteristic
bands selected by the GA method was the largest, reaching as high as 147, or accounting for 43.24% 
of the 340 total bands; the CC method used the second-largest number of bands. However, the 
numbers of characteristic bands used in the STE and CARS methods were the lowest, at 14 and 51, 
respectively, accounting for 4.12% and 15.00% of the 340 total bands. Although the M1 and M3 
values of the GA method were slightly higher than those of the CARS method and the M2 value 
was slightly lower, the number of characteristic bands selected by the GA method was also far 
greater than that selected by the CARS method. Considering the accuracy and complexity of the 
model, CARS is the optimal band selection method for the estimation of the Au content in the soil 
using Seriphidium terrae-albae plant spectra. 

Figure 7b also suggests that three of the band selection methods, namely the GA, CARS, and 
CC methods, can greatly reduce the number of bands used in the constructed model without greatly 
reducing the Au content estimation accuracy. The bands used in these models accounted for only 
15.00%-43.24% of the total number of bands in the range of 400-2400 nm. Thus, 
filtering out noise may greatly reduce the complexity of the model, and the selected 
characteristic bands can be used to replace all bands in constructing an estimation model of the Au 
content in the soil. 
Fig. 7 M1 and M3 values of the different characteristic band screening methods (a), and M2 value of the different 
characteristic band screening methods and the numbers of characteristic bands in the models (b). M1, the mean 
value of the coefficient of determination (R²) obtained by the leave-one-out cross-validation of 15 estimation models 
based on different transform spectra; M2, the mean value of the root-mean-squared error (RMSE) obtained by the 
leave-one-out cross-validation of 15 estimation models based on different transform spectra; M3, the mean value of 
the ratio of performance to deviation (RPD) obtained by the leave-one-out cross-validation of 15 estimation models 
based on different transform spectra; GA, genetic algorithm; STE, stepwise regression analysis; CARS, competitive 
adaptive reweighted sampling; CC, correlation coefficient method. 


3.4 Optimal estimation model of the Au content in the soil 


Table 2 indicates that the highest accuracy was obtained when the characteristic bands were selected 
from the transformed spectrum (1/lnR)' of Seriphidium terrae-albae using the CARS method and 
were input into the PLS method to construct an inversion model of the Au content in the soil. Among 
the CARS-based models constructed from the 15 transformed spectra, this model yielded 
leave-one-out cross-validation R², RMSE, and RPD values of 0.8016, 7.07 μg/kg, 
and 2.20, respectively. The number and distribution of characteristic bands included in the model 
are shown in Table 3. The number of characteristic bands was 56, accounting for only 16.50% of 
all bands in the range of 400-2400 nm. Compared with those of the model based on the full 
spectrum, the R² and RPD values of the model based on the characteristic bands selected by the 
CARS method did not decrease significantly (R²: 0.8016 for the CARS-based model vs 0.8218 for 
the full-spectrum model; RPD: 2.20 vs 2.37), and the RMSE did not increase significantly (7.07 vs 
6.56 μg/kg); however, the number of bands used in the model was greatly reduced (56 for the 
CARS-based model vs 340 for the full-spectrum model), indicating that this 
simplified model constructed based on the characteristic bands selected by the CARS method 
provided high accuracy, stability, and generalization ability. This model is the best for the inversion 
of the Au content in the soil using the Seriphidium terrae-albae plant spectra. 
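
The trade-off between accuracy and complexity can be made concrete with a short calculation. The sketch below only restates the values reported above as relative changes with respect to the full-spectrum model; the helper name `relative_change` and the dictionary layout are illustrative choices, not part of the original analysis.

```python
def relative_change(reference, value):
    """Signed change of `value` relative to `reference`, as a fraction."""
    return (value - reference) / reference


# Values reported in the text for the two models.
full_spectrum = {"r2": 0.8218, "rmse": 6.56, "rpd": 2.37, "bands": 340}
cars_model = {"r2": 0.8016, "rmse": 7.07, "rpd": 2.20, "bands": 56}

changes = {
    key: relative_change(full_spectrum[key], cars_model[key])
    for key in full_spectrum
}
for key, change in changes.items():
    print(f"{key}: {change:+.1%}")
```

Under these figures, the R² and RPD decrease by roughly 2.5% and 7.2%, the RMSE increases by roughly 7.8%, and the number of bands falls by roughly 83.5%.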

Table 3 shows the distribution of characteristic bands in the spectral transformation (1/lnR)' 
screened with the CARS method. The selected characteristic bands were distributed in the visible- 
light, near-infrared, and short-wave infrared ranges. Among them, some of the bands that have been 
verified to be highly correlated with metal stress by other scholars (Hoque and Huntzler, 1992; Yang 
et al., 2016; Zhang et al., 2020) were selected, including a band within the green peak 
range (542 nm), a band within the red valley range (652 nm), two bands within the red edge (692 
and 772 nm), and a band within the water vapor absorption range (1457 nm). The above result 
suggests that the Au content estimation model constructed based on the characteristic bands selected 
by the CARS method has a strong physical significance. 


Table 3 Distribution of characteristic bands in the spectral transformation (ST14, (1/lnR)') selected by the 
competitive adaptive reweighted sampling (CARS) method 

Number of characteristic bands    Band distribution range (nm) 
56                                412-427, 517, 542, 577, 652, 692, 772, 842, 922, 947, 1032, 1047-1052, 1092, 1127, 1147, 1167-1172, 
                                  1197, 1212, 1402, 1427, 1457, 1487, 1622-1627, 1652, 1672, 1692-1697, 1702, 1712-1727, 1737, 1782, 
                                  1792, 2022, 2042, 2127, 2147-2152, 2167, 2197, 2242, 2257, 2292-2297, 2322-2327, 2342, 2352 
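
As a consistency check, the band list in Table 3 can be expanded and counted. The sketch below assumes a 5 nm spacing between adjacent bands inside each listed range; this spacing is inferred from the listed wavelengths and is not stated in this section. The function name `expand_band_list` is hypothetical.

```python
def expand_band_list(text, step=5):
    """Expand entries such as '412-427' into individual bands (assumed 5 nm step)."""
    bands = []
    for item in text.split(","):
        item = item.strip()
        if "-" in item:
            start, end = (int(value) for value in item.split("-"))
            bands.extend(range(start, end + 1, step))
        else:
            bands.append(int(item))
    return bands


# Band distribution copied from Table 3 (ranges written with hyphens).
TABLE_3_BANDS = (
    "412-427, 517, 542, 577, 652, 692, 772, 842, 922, 947, 1032, 1047-1052, "
    "1092, 1127, 1147, 1167-1172, 1197, 1212, 1402, 1427, 1457, 1487, "
    "1622-1627, 1652, 1672, 1692-1697, 1702, 1712-1727, 1737, 1782, 1792, "
    "2022, 2042, 2127, 2147-2152, 2167, 2197, 2242, 2257, 2292-2297, "
    "2322-2327, 2342, 2352"
)

selected_bands = expand_band_list(TABLE_3_BANDS)
print(len(selected_bands))  # number of individual bands in the list
print(f"{len(selected_bands) / 340:.2%} of the 340 bands")
```

With the assumed 5 nm step, the expanded list contains 56 bands, which matches the count given in the table.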


4 Discussion 


The emergence of hyperspectral technology is an important milestone in the development of remote 
sensing technology. Hyperspectral technology can be used to quickly detect tiny changes in plant 
spectra associated with metal stress, thus potentially providing quick and nondestructive estimates 
of the metal content. Currently, most scholars have used physical and statistical models to estimate 
the metal content (Liu et al., 2011, 2018; Hede et al., 2015; Zhang et al., 2017a, b, 2019; Zhang et 
al., 2018). Compared with physical models, statistical models have the advantages of being fast and 
simple; therefore, they have been widely used. However, although improved spectral resolutions 
can considerably enhance the amount of information and aid in the identification of characteristic 
parameters related to the degree of metal stress, the effects of interference factors such as soil 
background, canopy structure, and atmosphere make the canopy reflection spectra of plants 
extremely complex and variable (Cui et al., 2018). Additionally, statistical models construct 
a direct quantitative relationship between the obtained plant spectrum (independent variable) and the 
metal content (dependent variable), so a subtle error in the spectrum will often greatly affect the 
accuracy of the metal content estimates (Shi et al., 2016b; Wang et al., 2018). Therefore, eliminating 
background and noise effects while extracting useful spectral data has become a current research 
hotspot (Wang et al., 2018). 

In this study, the Au content in the soil was estimated using plant spectra, and then 15 different 
forms of spectral transformation were explored to find the optimal spectral transformation form. 
The results indicate that the inversion model constructed using derivative spectroscopy provides a 
high accuracy. The reason for this phenomenon may be that the reflectance spectrum of the 
plant canopy, rather than that of plant leaves, was measured in this study. Hyperspectral sensors are 
commonly mounted on spaceborne and airborne platforms to vertically observe the ground and obtain 
the canopy spectra of plants. However, Seriphidium terrae-albae is a small shrub, and the leaves 
are small; thus, the canopy is not completely covered by leaves, which causes the canopy spectrum 
to contain noise, such as soil background information. In addition, canopy structure parameters, 
such as the leaf inclination of a plant, will affect the canopy reflectance spectrum. Due to the 
existence of this interference, some spectral information that contributes to the inversion of the Au 
content may be hidden. Thus, after the first-order and second-order derivatives are obtained, the 
soil background information and some random noise can be eliminated to some extent 
(Demetriades-Shah et al., 1990; Philpot, 1991; Smith et al., 2004). Moreover, the derivative spectra 
can be used to extract the inflection point and slope information hidden in the plant spectra, amplify 
the differences between the spectra to a certain extent, and improve the inversion accuracy of the 
corresponding model (Cui et al., 2018). 

It was found that the first derivative of the reciprocal of the logarithmic spectrum ((1/lnR)') is the best 
spectral transformation among the 15 spectral transformations considered in this study. Notably, 
the plant canopy reflectance spectrum contains not only multiplicative noise but also additive noise. 
Calculating the reciprocal of the logarithm of the original spectrum can reduce the influence of 
multiplicative noise caused by the changes in light conditions to some extent (Bhargava and 
Mariam, 1992; Gong et al., 2001; Wang et al., 2011); afterwards, determining the first derivative 
can reduce the influence of additive noise (Zhang et al., 1997). Thus, the spectral 
transformation (1/lnR)' not only reduces the effect of multiplicative noise but also decreases the 
effect of additive noise, allowing many tiny changes in the plant spectrum caused by the metal stress 
to be extracted. Consequently, among all the models considered, the metal content estimation model 
constructed with this transformed spectrum displays the highest accuracy. 
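
The transformation (1/lnR)' can be sketched as follows. This is a minimal illustration rather than the processing chain used in the study: the derivative is approximated by a forward finite difference (the actual differencing scheme is not given in this section), the function names are hypothetical, and the reflectance values are invented. The final lines show only the additive part of the argument, namely that a constant offset added to a spectrum disappears after differentiation.

```python
from math import log


def reciprocal_log(reflectance):
    """Compute 1/ln(R) for each reflectance value, requiring 0 < R < 1."""
    if any(r <= 0.0 or r >= 1.0 for r in reflectance):
        raise ValueError("reflectance values must lie strictly between 0 and 1")
    return [1.0 / log(r) for r in reflectance]


def first_derivative(values, wavelengths):
    """Forward finite difference d(value)/d(wavelength); one element shorter."""
    return [
        (values[i + 1] - values[i]) / (wavelengths[i + 1] - wavelengths[i])
        for i in range(len(values) - 1)
    ]


def reciprocal_log_derivative(reflectance, wavelengths):
    """Approximate (1/lnR)' on the given wavelength grid."""
    return first_derivative(reciprocal_log(reflectance), wavelengths)


# Hypothetical canopy reflectance sampled every 5 nm.
wavelengths = [542, 547, 552, 557, 562]
reflectance = [0.110, 0.118, 0.121, 0.117, 0.109]
transformed = reciprocal_log_derivative(reflectance, wavelengths)

# A constant additive offset does not change the first derivative.
base = reciprocal_log(reflectance)
offset = [value + 0.3 for value in base]
difference = [
    abs(a - b)
    for a, b in zip(first_derivative(base, wavelengths),
                    first_derivative(offset, wavelengths))
]
print(max(difference) < 1e-12)
```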


5 Conclusions 


In this study, the spatial coupling relationship between the Au content in the soil and concealed Au 
ore bodies was first analyzed, and the feasibility and effectiveness of estimating the Au content in 
the soil by using the spectra of plants in the visible, near-infrared, and short-wave infrared ranges 
were discussed. The results indicate that the Au content in the soil above an ore body is much higher 
than that in other areas, and the abnormal Au content in the soil corresponds well with the locations 
of ore bodies. Au content anomalies in the soil can be used as markers for the delineation of 
metallogenic targets. Furthermore, the Au content in the soil can be accurately estimated by establishing a 
predictive model using the characteristic bands selected from plant spectra. However, in the 
modeling process, it is crucial to select the appropriate transform spectrum and band selection 
method. The PLS model based on the transformed spectrum (1/lnR)' and the CARS 
method performs well in estimating the Au content in the soil, with the R², RMSE, and RPD between 
the predicted values obtained by the leave-one-out cross-validation method and the measured values reaching 
0.8016, 7.07 μg/kg, and 2.20, respectively. Overall, this study proposes a method to identify 
concealed Au deposits by using plant spectra; this approach may have promising application 
prospects because of its simplicity and speed. 

The greatest advantages of the proposed method are its simplicity and speed. Our research group 
has further constructed a "super low-altitude detection platform" that can obtain observations at the 
"submeter level" based on an airborne X 1-912 power suspension glider and a HySpex hyperspectral 
sensor. Therefore, our next research goal is to apply the model constructed in this study in 
conjunction with this super low-altitude detection platform to determine whether the model remains 
valid. If feasible, the Au content in the soil could be quickly estimated over large areas, and 
metallogenic target areas could be remotely delineated. These advantages could greatly improve the 
efficiency of prospecting. 